Text Analytics
An Efficient Classification Model for Cyber Text
Hossen, Md Sakhawat, Borshon, Md. Zashid Iqbal, Badrudduza, A. S. M.
The rise of deep learning methodology and practice in recent years has brought a severe consequence: a growing carbon footprint driven by an insatiable demand for computational resources and power. Text analytics has likewise undergone a massive transformation under this dominant methodology. In this paper, the original TF-IDF algorithm is modified, and Clement Term Frequency-Inverse Document Frequency (CTF-IDF) is proposed for data preprocessing. The paper primarily discusses the effectiveness of classical machine learning techniques in text analytics when combined with CTF-IDF and the faster IRLBA algorithm for dimensionality reduction. Introducing both techniques into the conventional text analytics pipeline yields an application that is more efficient, faster, and less computationally intensive than deep learning methods in terms of carbon footprint, with only a minor compromise in accuracy. The experimental results also exhibit a manifold reduction in time complexity and an improvement in model accuracy for the classical machine learning methods discussed further in this paper.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > Jordan (0.04)
- Asia > Bangladesh (0.04)
- (2 more...)
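The classical pipeline the abstract describes (TF-IDF weighting followed by IRLBA-style truncated SVD before classification) can be sketched roughly as follows. The abstract does not specify the CTF-IDF modification, so plain TF-IDF stands in here as an assumption, and the dimensionality-reduction step is only noted in a comment:

```python
import math

def tfidf(docs):
    """Plain TF-IDF as a stand-in; the paper's CTF-IDF variant is not
    detailed in the abstract, so this is the classical baseline."""
    vocab = sorted({w for d in docs for w in d.split()})
    idx = {w: i for i, w in enumerate(vocab)}
    rows = []
    for d in docs:
        words = d.split()
        row = [0.0] * len(vocab)
        for w in words:
            row[idx[w]] += 1.0 / len(words)  # normalized term frequency
        rows.append(row)
    # document frequency and inverse document frequency per term
    df = [sum(1 for r in rows if r[j] > 0) for j in range(len(vocab))]
    idf = [math.log(len(docs) / d_j) for d_j in df]
    return [[tf_ij * idf[j] for j, tf_ij in enumerate(r)] for r in rows], vocab

docs = ["spam offer now", "meeting agenda now", "spam spam offer"]
X, vocab = tfidf(docs)
# In the paper's pipeline, X would next be reduced to its leading singular
# directions with IRLBA (an iterative truncated SVD) before a classical
# classifier is trained on the low-dimensional representation.
```

The resulting matrix is sparse and high-dimensional, which is exactly why a truncated factorization such as IRLBA pays off: it computes only the leading singular vectors rather than the full decomposition.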
VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents
Lee, Sam Yu-Te, Ji, Chenyang, Wen, Shicheng, Huang, Lifu, Liu, Dongyu, Ma, Kwan-Liu
Text analytics has traditionally required specialized knowledge in Natural Language Processing (NLP) or text analysis, which presents a barrier for entry-level analysts. Recent advances in large language models (LLMs) have changed the landscape of NLP by enabling more accessible and automated text analysis (e.g., topic detection, summarization, information extraction, etc.). We introduce VIDEE, a system that supports entry-level data analysts in conducting advanced text analytics with intelligent agents. VIDEE instantiates a human-agent collaboration workflow consisting of three stages: (1) Decomposition, which incorporates a human-in-the-loop Monte-Carlo Tree Search algorithm to support generative reasoning with human feedback, (2) Execution, which generates an executable text analytics pipeline, and (3) Evaluation, which integrates LLM-based evaluation and visualizations to support user validation of execution results. We conduct two quantitative experiments to evaluate VIDEE's effectiveness and analyze common agent errors. A user study involving participants with varying levels of NLP and text analytics experience -- from none to expert -- demonstrates the system's usability and reveals distinct user behavior patterns. The findings identify design implications for human-agent collaboration, validate the practical utility of VIDEE for non-expert users, and inform future improvements to intelligent text analytics systems.
- North America > United States > California (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- (2 more...)
- Workflow (1.00)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.67)
- Personal > Interview (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Generative AI Takes a Statistics Exam: A Comparison of Performance between ChatGPT3.5, ChatGPT4, and ChatGPT4o-mini
Many believe that use of generative AI as a private tutor has the potential to shrink access and achievement gaps between students and schools with abundant resources and those with fewer resources. Shrinking the gap is possible only if paid and free versions of the platforms perform with the same accuracy. In this experiment, we investigate the performance of GPT versions 3.5, 4.0, and 4o-mini on the same 16-question statistics exam given to a class of first-year graduate students. While we do not advocate using any generative AI platform to complete an exam, the use of exam questions allows us to explore aspects of ChatGPT's responses to typical questions that students might encounter in a statistics course. Results on accuracy indicate that GPT3.5 would fail the exam, GPT4 would perform well, and GPT4o-mini would perform somewhere in between. While we acknowledge the existence of other generative AI/LLM platforms, our discussion concerns only ChatGPT because it is the most widely used platform on college campuses at this time. We further investigate differences among the AI platforms in their answers to each problem using methods developed for text analytics, such as reading level evaluation and topic modeling. Results indicate that GPT3.5 and GPT4o-mini are more similar to each other than either is to GPT4.
- North America > United States > Arkansas (0.04)
- Asia > Middle East > Jordan (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- (5 more...)
- Research Report > New Finding (1.00)
- Instructional Material > Course Syllabus & Notes (0.93)
- Research Report > Experimental Study (0.69)
- Health & Medicine (1.00)
- Education > Educational Setting > Higher Education (1.00)
- Education > Curriculum > Subject-Specific Education (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
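One of the text-analytics methods the abstract names, reading level evaluation, can be sketched with the standard Flesch Reading Ease formula, 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words). The vowel-group syllable counter below is a common crude heuristic, not the paper's method:

```python
import re

def flesch_reading_ease(text):
    """Flesch Reading Ease score; higher means easier to read.
    Syllables are approximated by counting vowel groups per word."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouyAEIOUY]+", w)))
                    for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))

score = flesch_reading_ease("The cat sat. The dog ran.")
```

Comparing such scores across the three models' answers is one way to quantify how the platforms differ in the complexity of their prose.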
H-COAL: Human Correction of AI-Generated Labels for Biomedical Named Entity Recognition
Duan, Xiaojing, Lalor, John P.
With the rapid advancement of machine learning models for NLP tasks, collecting high-fidelity labels from AI models is a realistic possibility. Firms now make AI available to customers via predictions as a service (PaaS). This includes PaaS products for healthcare. It is unclear whether these labels can be used for training a local model without expensive annotation checking by in-house experts. In this work, we propose a new framework for Human Correction of AI-Generated Labels (H-COAL). By ranking AI-generated outputs, one can selectively correct labels and approach gold standard performance (100% human labeling) with significantly less human effort. We show that correcting 5% of labels can close the AI-human performance gap by up to 64% relative improvement, and correcting 20% of labels can close the performance gap by up to 86% relative improvement.
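The core H-COAL idea, ranking AI-generated outputs and routing only a small fraction to human correction, can be sketched as below. Using raw model confidence as the ranking signal is an assumption for illustration; the paper may rank outputs differently:

```python
def h_coal_correct(ai_labels, confidences, oracle, budget=0.05):
    """Sketch of selective label correction: flag the least-confident
    fraction (the budget) of AI-generated labels and replace them with
    human-supplied gold labels via the oracle callback."""
    n = len(ai_labels)
    k = int(n * budget)
    # indices of the k least-confident predictions
    flagged = sorted(range(n), key=lambda i: confidences[i])[:k]
    corrected = list(ai_labels)
    for i in flagged:
        corrected[i] = oracle(i)  # human annotator supplies the gold label
    return corrected

# Toy biomedical NER example with hypothetical labels and confidences
ai = ["O", "B-DRUG", "O", "B-DISEASE"]
conf = [0.99, 0.42, 0.97, 0.95]
gold = ["O", "I-DRUG", "O", "B-DISEASE"]
fixed = h_coal_correct(ai, conf, lambda i: gold[i], budget=0.25)
```

With a 25% budget here, only the single least-confident label (0.42) is sent to the oracle; the other three AI labels are kept as-is, which is how the framework approaches gold-standard performance with far less human effort.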
Application of Text Analytics in Public Service Co-Creation: Literature Review and Research Framework
Rizun, Nina, Revina, Aleksandra, Edelmann, Noella
The public sector faces several challenges that need to be addressed, such as numerous external and internal demands for change and citizens' dissatisfaction and frustration with public sector organizations. An alternative to the traditional top-down development of public services is co-creation, which promotes collaboration between stakeholders with the aim of creating better public services and achieving public values. At the same time, data analytics has been fuelled by the availability of immense amounts of textual data. Whilst both co-creation and Text Analytics (TA) have been used in the private sector, we study existing work on the application of TA techniques to text data in support of public service co-creation. We systematically review 75 of 979 papers that focus directly or indirectly on the application of TA in the context of public service development. In our review, we analyze the TA techniques, the public services they support, the public value outcomes, and the co-creation phase in which they are used. Our findings indicate that the implementation of TA for co-creation is still in its early stages and thus still limited. Our research framework promotes the concept and stimulates a stronger role for TA techniques in supporting public sector organisations and their use of the co-creation process. From the standpoint of policy-makers and public administration managers, our findings and the proposed research framework can serve as a guideline for developing a strategy for designing co-created, user-centred public services.
- Europe > Denmark (0.14)
- Asia > China (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- (12 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Government > E-government (0.47)
- Government > Regional Government > Europe Government (0.46)
5 Completely FREE Natural Language Processing Courses
Text Analytics 2: Visualizing Natural Language Processing is a practical course consisting of 3 modules. In the first module, you will learn about Text Analytics and Human Cognition, Measuring Linguistic Similarity, Topic Modelling, and more. The next lesson covers how to visualize text analytics. The last section of the course covers how to apply text analytics to new fields.
Expanding AI technology for unstructured biomedical text beyond English
The health industry is embracing the power of big data, cloud computing, and clinical analytics, harnessing data to deliver insights that can improve care and efficiency. Still, unstructured text remains a challenge--made even more complex by barriers of language. Doctors' notes and other unstructured text are often left unreferenced, are hard to parse and learn from, and are difficult to extract insights from, which leads to missed opportunities for diagnosis and better care. Microsoft recognizes the need to enable healthcare organizations worldwide to gather insights from this data--for better, faster, and more personalized care, and to improve health equity. With Text Analytics for Health, a part of Azure Cognitive Services, healthcare organizations around the world can now extract meaningful insights from unstructured text in seven languages and process it in a way that enables clinical decision support like never before.
- South America > Brazil (0.06)
- Asia > Middle East > Israel (0.05)
- North America > United States (0.05)
Top 19 Data Science Interview Questions for Beginners - DataScienceCentral.com
Job interviews make everyone nervous, but that is what they are designed to do: they are the most common way to assess a candidate's presence of mind and ability to remain calm and composed in a tense situation. To ace the interview, you need in-depth knowledge of the role you are interviewing for and what is expected of you. Presence of mind and strong subject knowledge matter even more when you are preparing for a Data Scientist interview, as it is certain to test your capabilities.
Azure Bicep: Deploy a Cognitive Services container image for Text Analytics.
This article reviews how to use Azure Bicep to deploy a Cognitive Services resource and an Azure Container Instances resource to create a container image that can be used for text analytics. Before moving forward, take a moment to read the article below, which explains the architecture and objectives in detail. Let's analyze the Bicep template. Create a new file in your working directory and name it 'main.bicep'. Note that we declare two resources: the Azure Cognitive Services resource and the Azure Container Instance resource.
Blueprints for Text Analytics Using Python: Machine Learning-Based Solutions for Common Real World (NLP) Applications
Albrecht, Jens, Ramachandran, Sidharth, Winkler, Christian
This book is intended to support data scientists and developers so they can quickly enter the area of text analytics and natural language processing. Thus, we put the focus on developing practical solutions that can serve as blueprints in your daily business. A blueprint, in our definition, is a best-practice solution for a common problem. It is a template that you can easily copy and adapt for reuse. For these blueprints we use production-ready Python frameworks for data analysis, natural language processing, and machine learning.